Tagging terms in text

Abstract

As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most approaches to ATE broadly follow the traditional hybrid methodology, by first extracting a list of unique candidate terms, and then classifying these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was added with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies proved to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state-of-the-art in ATE.
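
To make the contrast between the two framings concrete, here is a minimal sketch in Python. It assumes an IOB labelling scheme (B = begins a term, I = inside a term, O = outside), which the abstract does not specify. The function names `iob_tags` and `terms_from_tags`, the example sentence, and the term spans are all hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple


def iob_tags(tokens: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Sequential view: give every token occurrence a label in context.

    term_spans holds (start, end) token indices, end exclusive.
    The IOB scheme is an assumption made for this sketch.
    """
    tags = ["O"] * len(tokens)
    for start, end in term_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def terms_from_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Candidate-list view: collapse tagged tokens into unique terms."""
    terms, current = set(), []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new term starts; close any term still open.
            if current:
                terms.add(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                terms.add(" ".join(current))
            current = []
    if current:
        terms.add(" ".join(current))
    return sorted(terms)


# Hypothetical example sentence with two annotated terms.
tokens = "Automatic term extraction uses a recurrent neural network".split()
spans = [(0, 3), (5, 8)]
tags = iob_tags(tokens, spans)
unique_terms = terms_from_tags(tokens, tags)
```

A sequential classifier such as a conditional random field or a recurrent network would predict `tags` directly, one label per token occurrence, whereas the traditional approach scores each entry of a list like `unique_terms` once, independent of where it occurs.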

Similar resources

Location Tagging in Text

Location tagging, also known as geotagging, is the process of assigning geographical coordinates to input data. In this project we present an algorithm for location tagging text. Our algorithm makes use of previous work in natural language processing by using a state-of-the-art part-of-speech tagger and named entity recognizer to find blocks of text which may refer to locations. A knowledge bas...

A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging

Text summarization is the process of distilling the most important information from a source to produce an abridged version for a particular user and task. When this is done by means of a computer, i.e. automatically, it is called Automatic Text Summarization. Summarization can be classified into two approaches: extraction and abstraction. Extraction based summaries are produced by concatenating...

SentiTagger - Automatically Tagging Text in OpinionMining-ML

This paper presents SentiTagger, a research project proposal aiming at designing and implementing a computational system that automatically tags free text in OpinionMining-ML [1]. The latter is an XML-based formalism that has been proposed as a standard in the field of Sentiment Analysis.

Text Clustering using Semantic Terms

In traditional text clustering, documents are represented by term frequencies without considering the semantic information of each document (i.e., the vector model). Because of this property of the vector model, documents may be incorrectly assigned to different clusters when documents of the same cluster lack shared terms. Recently, knowledge-based approaches have been used to overcome this problem. However, these approaches have an...

Associating Terms with Text Categories

Discriminating between text articles and automatically classifying documents is an essential task for many applications. With the prevalence of digital documents and the wide use of e-mail and web documents, text categorization is regaining interest and is becoming a central problem in digital text collections. There have been many approaches to solve this problem, mainly from the machine learn...

Journal

Journal title: Terminology

Year: 2022

ISSN: 0929-9971, 1569-9994

DOI: https://doi.org/10.1075/term.21010.rig